AITopics | rmsprop 0

rmsprop 0

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Appendices

Neural Information Processing Systems, Feb-9-2026, 07:07:08 GMT

And, for each of them, the second (final) stripe has 44 options. It could seem that small improvements in efficacy may have only a minor effect on final network accuracy, especially considering the noisiness inherent in large-scale training. Better than reducing the magnitude of lost weights, though, is completely eliminating it - by using the zeros already present in the unstructured sparse weight matrix, it may be possible to find a permutation that does not lose any magnitude after applying the N:M constraint.
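
The excerpt's point about permutations and the N:M constraint can be illustrated with a toy example. The sketch below is not the paper's search procedure: it assumes a 2:4 constraint applied along each row, assumes the permutation reorders columns, and uses a brute-force search that is only feasible for tiny matrices. The helper names (`nm_prune_loss`, `total_loss`, `best_permutation`) and the matrix `W` are hypothetical.

```python
from itertools import permutations

def nm_prune_loss(row, n=2, m=4):
    """Magnitude lost when only the n largest-|w| entries survive in each group of m."""
    lost = 0.0
    for start in range(0, len(row), m):
        group = sorted(abs(w) for w in row[start:start + m])
        lost += sum(group[:len(group) - n])  # the smallest m - n entries are pruned
    return lost

def total_loss(matrix, perm, n=2, m=4):
    """Total pruned magnitude after reordering the columns by perm."""
    return sum(nm_prune_loss([row[j] for j in perm], n, m) for row in matrix)

def best_permutation(matrix, n=2, m=4):
    """Exhaustive search over column orders (toy sizes only)."""
    cols = range(len(matrix[0]))
    return min(permutations(cols), key=lambda p: total_loss(matrix, p, n, m))

# Toy unstructured-sparse matrix: every row is already 50% zeros,
# but the nonzeros are clustered, so naive 2:4 pruning discards weight.
W = [
    [3.0, 2.0, 1.0, 4.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 5.0, 1.0, 2.0, 3.0],
]
identity = tuple(range(8))
perm = best_permutation(W)

print(total_loss(W, identity))  # 6.0: magnitude lost without permuting
print(total_loss(W, perm))      # 0.0: zeros absorb the whole constraint
```

In this toy case the permutation spreads the existing zeros so that every group of four already contains two of them, and the 2:4 constraint removes no magnitude at all.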

artificial intelligence, machine learning, permutation, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.31)

149ef6419512be56a93169cd5e6fa8fd-Supplemental.pdf

Neural Information Processing Systems, Feb-7-2026, 14:02:51 GMT

pmnist classification task, rmsprop 0, sequence, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Appendix for Integrating Momentum into Recurrent Neural Networks

Neural Information Processing Systems, Oct-2-2025, 04:41:06 GMT

Section 3.1, we flatten the image and process it pixel by pixel as a sequence of length 784. The baseline LSTM models consist of one LSTM cell with 128 and 256 hidden units. Orthogonal initialization is used for input-to-hidden weights, while hidden-to-hidden weights are initialized to identity matrices. The gradient norms are clipped to 1 during training. The log-magnitude of these sequences is fed into the models as the input data.
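
Three of the setup steps in this excerpt (pixel-by-pixel flattening, identity initialization of hidden-to-hidden weights, and clipping the gradient norm to 1) can be sketched in plain Python. This is a minimal illustration under assumptions, not the authors' training code: the helper names are hypothetical, the image is a placeholder, clipping is assumed to act on the global L2 norm over all gradients, and orthogonal initialization is omitted.

```python
import math

def flatten_image(image):
    """Turn a 2-D image (list of rows) into a pixel-by-pixel sequence."""
    return [pixel for row in image for pixel in row]

def identity_matrix(size):
    """Identity initialization, as described for hidden-to-hidden weights."""
    return [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]

def clip_grad_norm(grads, max_norm=1.0):
    """Rescale gradient vectors so their global L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total <= max_norm:
        return grads
    scale = max_norm / total
    return [[g * scale for g in vec] for vec in grads]

image = [[0.0] * 28 for _ in range(28)]  # placeholder 28x28 image
sequence = flatten_image(image)          # 28 * 28 = 784 time steps
W_hh = identity_matrix(128)              # 128 hidden units, one of the two sizes mentioned

# Two gradient vectors with global norm 5 are rescaled to norm 1.
clipped = clip_grad_norm([[3.0, 4.0], [0.0]])
print(len(sequence), clipped)
```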

artificial intelligence, machine learning, rmsprop 0, (12 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Gradient Flow Matching for Learning Update Dynamics in Neural Network Training

Shou, Xiao, Ding, Yanna, Gao, Jianxi

arXiv.org Machine Learning, May-27-2025

Training deep neural networks remains computationally intensive due to the iterative nature of gradient-based optimization. We propose Gradient Flow Matching (GFM), a continuous-time modeling framework that treats neural network training as a dynamical system governed by learned optimizer-aware vector fields. By leveraging conditional flow matching, GFM captures the underlying update rules of optimizers such as SGD, Adam, and RMSprop, enabling smooth extrapolation of weight trajectories toward convergence. Unlike black-box sequence models, GFM incorporates structural knowledge of gradient-based updates into the learning objective, facilitating accurate forecasting of final weights from partial training sequences. Empirically, GFM achieves forecasting accuracy that is competitive with Transformer-based models and significantly outperforms LSTM and other classical baselines. Furthermore, GFM generalizes across neural architectures and initializations, providing a unified framework for studying optimization dynamics and accelerating convergence prediction.
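
For readers unfamiliar with the update rules the abstract refers to, here is a minimal sketch of the textbook RMSprop step. It is not part of GFM; the function name `rmsprop_step`, the hyperparameter values, and the toy quadratic objective are assumptions chosen for illustration.

```python
import math

def rmsprop_step(params, grads, state, lr=0.01, rho=0.9, eps=1e-8):
    """One RMSprop update: divide each gradient by a running RMS of its history."""
    new_params, new_state = [], []
    for p, g, v in zip(params, grads, state):
        v = rho * v + (1.0 - rho) * g * g  # running average of squared gradients
        new_params.append(p - lr * g / (math.sqrt(v) + eps))
        new_state.append(v)
    return new_params, new_state

# Toy objective f(x, y) = x^2 + 10 * y^2, with gradient (2x, 20y).
params, state = [1.0, 1.0], [0.0, 0.0]
for _ in range(500):
    grads = [2.0 * params[0], 20.0 * params[1]]
    params, state = rmsprop_step(params, grads, state)

print(params)  # both coordinates end up close to the minimum at (0, 0)
```

Because each coordinate is divided by its own running gradient scale, the steep `y` direction and the shallow `x` direction take steps of similar size, which is the per-coordinate adaptivity that distinguishes RMSprop from plain SGD.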

artificial intelligence, machine learning, trajectory, (18 more...)

arXiv.org Machine Learning

2505.20221

Country:

North America > United States > New York > Rensselaer County > Troy (0.04)
North America > United States > Texas > McLennan County > Waco (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Understanding Optimization in Deep Learning with Central Flows

Cohen, Jeremy M., Damian, Alex, Talwalkar, Ameet, Kolter, Zico, Lee, Jason D.

arXiv.org Machine Learning, Oct-31-2024

Optimization in deep learning remains poorly understood, even in the simple setting of deterministic (i.e. full-batch) training. A key difficulty is that much of an optimizer's behavior is implicitly determined by complex oscillatory dynamics, referred to as the "edge of stability." The main contribution of this paper is to show that an optimizer's implicit behavior can be explicitly captured by a "central flow:" a differential equation which models the time-averaged optimization trajectory. We show that these flows can empirically predict long-term optimization trajectories of generic neural networks with a high degree of numerical accuracy. By interpreting these flows, we reveal for the first time 1) the precise sense in which RMSProp adapts to the local loss landscape, and 2) an "acceleration via regularization" mechanism, wherein adaptive optimizers implicitly navigate towards low-curvature regions in which they can take larger steps. This mechanism is key to the efficacy of these adaptive optimizers. Overall, we believe that central flows constitute a promising tool for reasoning about optimization in deep learning.

central flow 0, flow 0, stable flow 0, (12 more...)

arXiv.org Machine Learning

2410.24206

Country: North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Benchmark Analysis of Various Pre-trained Deep Learning Models on ASSIRA Cats and Dogs Dataset

Himel, Galib Muhammad Shahriar, Islam, Md. Masudul

arXiv.org Artificial Intelligence, Jan-9-2024

As the most basic application and implementation of deep learning, image classification has grown in popularity. Various datasets are provided by renowned data science communities for benchmarking machine learning algorithms and pre-trained models. The ASSIRA Cats & Dogs dataset is one of them and is used in this research for its overall acceptance and benchmark standards. A comparison of various pre-trained models is demonstrated by using different types of optimizers and loss functions. Hyper-parameters are changed to gain the best result from a model. By applying this approach, we obtained higher accuracy without major changes in the training model. To run the experiment, we used three different computer architectures: a laptop equipped with NVIDIA GeForce GTX 1070, a laptop equipped with NVIDIA GeForce RTX 3080Ti, and a desktop equipped with NVIDIA GeForce RTX 3090. The acquired results show higher accuracy than previous experiments on this dataset. The highest accuracy in this experiment, 99.65%, is achieved using NASNet Large.

accuracy, architecture, dataset, (16 more...)

arXiv.org Artificial Intelligence

2401.04666

Country:

Asia > Bangladesh > Dhaka Division > Dhaka District > Dhaka (0.04)
Europe > United Kingdom > England > Greater London > London (0.04)
Asia > Malaysia > Penang (0.04)
(3 more...)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Realistic mask generation for matter-wave lithography via machine learning

Fiedler, Johannes, Palau, Adrià Salvador, Osestad, Eivind Kristen, Parviainen, Pekka, Holst, Bodil

arXiv.org Artificial Intelligence, Jul-15-2022

Fast production of large area patterns with nanometre resolution is crucial for the established semiconductor industry and for enabling industrial-scale production of next-generation quantum devices. Metastable atom lithography with binary holography masks has been suggested as a higher resolution/low-cost alternative to the current state of the art: extreme ultraviolet (EUV) lithography. However, it was recently shown that the interaction of the metastable atoms with the mask material (SiN) leads to a strong perturbation of the wavefront, not included in existing mask generation theory, which is based on classical scalar waves. This means that the inverse problem (creating a mask based on the desired pattern) cannot be solved analytically even in 1D. Here we present a machine learning approach to mask generation targeted for metastable atoms. Our algorithm uses a combination of genetic optimisation and deep learning to obtain the mask. A novel deep neural architecture is trained to produce an initial approximation of the mask. This approximation is then used to generate the initial population of the genetic optimisation algorithm that can converge to arbitrary precision. We demonstrate the generation of arbitrary 1D patterns for system dimensions within the Fraunhofer approximation limit.

artificial intelligence, evolutionary algorithm, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2207.08723

Country:

Oceania > Palau (0.14)
Europe > Norway > Western Norway > Vestland > Bergen (0.04)
North America > United States > Oklahoma > Beaver County (0.04)
(4 more...)

Genre: Research Report (0.64)

Industry:

Semiconductors & Electronics (0.54)
Energy > Renewable (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

ConformalLayers: A non-linear sequential neural network with associative layers

Sousa, Eduardo Vera, Fernandes, Leandro A. F., Vasconcelos, Cristina Nader

arXiv.org Artificial Intelligence, Oct-22-2021

Convolutional Neural Networks (CNNs) have been widely applied. But as the CNNs grow, the number of arithmetic operations and memory footprint also increase. Furthermore, typical non-linear activation functions do not allow associativity of the operations encoded by consecutive layers, preventing the simplification of intermediate steps by combining them. We present a new activation function that allows associativity between sequential layers of CNNs. Even though our activation function is non-linear, it can be represented by a sequence of linear operations in the conformal model for Euclidean geometry. In this domain, operations like, but not limited to, convolution, average pooling, and dropout remain linear. We take advantage of associativity to combine all the "conformal layers" and make the cost of inference constant regardless of the depth of the network.
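
The associativity argument in this abstract can be illustrated with ordinary matrices. The sketch below covers only the generic fact that a stack of linear layers collapses into a single matrix, so inference cost no longer depends on depth. It does not reproduce the paper's activation function or its conformal model, and the helper names and weights are hypothetical.

```python
def matmul(a, b):
    """Matrix product of two lists-of-rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matvec(a, x):
    """Matrix-vector product."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def forward_layerwise(layers, x):
    """Apply the layers one at a time: cost grows with depth."""
    for w in layers:
        x = matvec(w, x)
    return x

def collapse(layers):
    """Pre-multiply all layers once; later layers multiply on the left."""
    combined = layers[0]
    for w in layers[1:]:
        combined = matmul(w, combined)
    return combined

layers = [
    [[1.0, 2.0], [0.0, 1.0]],
    [[0.5, 0.0], [1.0, 1.0]],
    [[2.0, -1.0], [0.0, 3.0]],
]
x = [1.0, -2.0]

combined = collapse(layers)          # computed once, ahead of inference
print(forward_layerwise(layers, x))  # [2.0, -15.0]
print(matvec(combined, x))           # [2.0, -15.0]
```

With a typical non-linear activation between the layers this collapse is not possible, which is the limitation the abstract says its activation function avoids.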

activation function, conformallayer, neural network, (16 more...)

arXiv.org Artificial Intelligence

2110.12108

Country:

North America > Canada > Ontario > Toronto (0.14)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.04)
South America > Brazil > Rio de Janeiro > Niterói (0.04)
North America > United States (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Classification and Feature Transformation with Fuzzy Cognitive Maps

Szwed, Piotr

arXiv.org Artificial Intelligence, Mar-8-2021

Fuzzy Cognitive Maps (FCMs) are considered a soft computing technique combining elements of fuzzy logic and recurrent neural networks. They have found multiple applications in such domains as modeling of system behavior, prediction of time series, decision making and process control. Less attention, however, has been turned towards using them in pattern classification. In this work we propose an FCM-based classifier with a fully connected map structure. In contrast to methods that expect reaching a steady system state during reasoning, we chose to execute a few FCM iterations (steps) before collecting output labels. Weights were learned with a gradient algorithm, and logloss or cross-entropy was used as the cost function. Our primary goal was to verify whether such a design would result in a decent general-purpose classifier, with performance comparable to off-the-shelf classical methods. As the preliminary results were promising, we investigated the hypothesis that the performance of a $d$-step classifier can be attributed to the fact that in the previous $d-1$ steps it transforms the feature space by grouping observations belonging to a given class, so that they become more compact and separable. To verify this hypothesis we calculated three clustering scores for the transformed feature space. We also evaluated the performance of pipelines built from an FCM-based data transformer followed by a classification algorithm. The standard statistical analyses confirmed both the performance of the FCM-based classifier and its capability to improve data. The supporting prototype software was implemented in Python using the TensorFlow library.

algorithm, classifier, dataset, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.asoc.2021.107271

2103.05124

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Poland > Lesser Poland Province > Kraków (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Are we Forgetting about Compositional Optimisers in Bayesian Optimisation?

Grosnit, Antoine, Cowen-Rivers, Alexander I., Tutunov, Rasul, Griffiths, Ryan-Rhys, Wang, Jun, Bou-Ammar, Haitham

arXiv.org Machine Learning, Dec-17-2020

Bayesian optimisation presents a sample-efficient methodology for global optimisation. Within this framework, a crucial performance-determining subroutine is the maximisation of the acquisition function, a task complicated by the fact that acquisition functions tend to be non-convex and thus nontrivial to optimise. In this paper, we undertake a comprehensive empirical study of approaches to maximise the acquisition function. Additionally, by deriving novel, yet mathematically equivalent, compositional forms for popular acquisition functions, we recast the maximisation task as a compositional optimisation problem, allowing us to benefit from the extensive literature in this field. We highlight the empirical advantages of the compositional approach to acquisition function maximisation across 3958 individual experiments comprising synthetic optimisation tasks as well as tasks from Bayesmark. Given the generality of the acquisition function maximisation subroutine, we posit that the adoption of compositional optimisers has the potential to yield performance improvements across all domains in which Bayesian optimisation is currently being applied.

acquisition function, deep learning, neural network, (22 more...)

arXiv.org Machine Learning

2012.0824

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Massachusetts (0.14)
North America > Canada (0.14)
Asia > Middle East > Qatar (0.14)

Genre:

Research Report > New Finding (0.46)
Instructional Material > Course Syllabus & Notes (0.45)

Industry:

Health & Medicine (0.96)
Energy > Oil & Gas (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
(6 more...)
